Czech-Sign Speech Corpus for Semantic Based Machine Translation
Authors
Abstract
This paper describes progress in the development of a human-human dialogue corpus for machine translation of spoken language. We have chosen a semantically annotated corpus of phone calls to a train timetable information center. The phone calls consist of inquiries regarding the callers' train travel plans. The corpus dialogue act tags incorporate abstract semantic meaning. We have enriched a part of the corpus with Sign Speech translation, and we have proposed methods for automatic machine translation from Czech to Sign Speech using the semantic annotation contained in the corpus.
Similar resources
Treebanks in Machine Translation
We present an approach using treebanks in machine translation. Our experiment in Czech-English machine translation is an attempt to develop a full machine translation system based on dependency trees (Dependency Based Machine Translation, DBMT). We use the following resources: Prague Dependency Treebank, a newly created Czech-English parallel corpus of Penn Treebank, English monolingual corpus,...
Spanish Phoneme Classification by Means of a Hierarchy of Kohonen Self-Organizing Maps
Research Issues for the Next Generation Spoken Dialogue Systems; Data-Driven Analysis of Speech; Towards a Road Map for Machine Translation Research; The Prague Dependency Treebank: Crossing the Sentence Boundary; Text Tiered Tagging and Combined Language Models Classifiers; Syntactic Tagging; Information, Language, Corpus and Linguistics; Prague Dependency Tre...
Language Resources for Spanish - Spanish Sign Language (LSE) translation
This paper describes the development of a Spanish-Spanish Sign Language (LSE) translation system. Firstly, it describes the first Spanish-Spanish Sign Language (LSE) parallel corpus focused on two specific domains: the renewal of the Identity Document and Driver’s License. This corpus includes more than 4,000 Spanish sentences (in these domains), their LSE translation and a video for each LSE s...
Corpus based coreference resolution for Farsi text
"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages
This work presents parallel corpora automatically annotated with several NLP tools, including lemma and part-of-speech tagging, named-entity recognition and classification, named-entity disambiguation, word-sense disambiguation, and coreference. The corpora comprise both the well-known Europarl corpus and a domain-specific question-answer troubleshooting corpus on the IT domain. English is comm...